Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 3498 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 232.4 KiB |
| Average record size in memory | 68.0 B |
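The average record size can be cross-checked from the total memory figure and the number of observations. The sketch below is only an arithmetic check; it assumes binary units (1 KiB = 1024 B), and the variable names are illustrative rather than part of the report.

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 232.4
n_observations = 3498

# Assumption: the report uses binary units, so 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```

Rounded to one decimal place, the result agrees with the 68.0 B reported in the table.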
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 4 |
Alerts
| Model_year is highly correlated with Mileage | High correlation |
|---|---|
| Kilometers is highly correlated with Model_year | High correlation |
| Registration is highly correlated with df_index and 2 other fields | High correlation |
| State is highly correlated with df_index and 2 other fields | High correlation |
| Fuel_capacity is highly correlated with Company and 5 other fields | High correlation |
| Price is highly correlated with Company and 2 other fields | High correlation |
| df_index is highly correlated with Registration and 2 other fields | High correlation |
| Company is highly correlated with Model_name and 4 other fields | High correlation |
| Model_name is highly correlated with Company and 2 other fields | High correlation |
| Fuel_Type is highly correlated with Mileage and 2 other fields | High correlation |
| Mileage is highly correlated with Company and 5 other fields | High correlation |
| Seating_capacity is highly correlated with Company and 2 other fields | High correlation |
| City is highly correlated with df_index and 2 other fields | High correlation |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| City has 216 (6.2%) zeros | Zeros |
Reproduction
| Analysis started | 2022-12-05 10:19:38.141217 |
|---|---|
| Analysis finished | 2022-12-05 10:20:18.554720 |
| Duration | 40.41 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
df_index
| Distinct | 3498 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1749.829617 |
| Minimum | 0 |
|---|---|
| Maximum | 3500 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 27.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 174.85 |
| Q1 | 875.25 |
| median | 1749.5 |
| Q3 | 2624.75 |
| 95-th percentile | 3324.15 |
| Maximum | 3500 |
| Range | 3500 |
| Interquartile range (IQR) | 1749.5 |
Descriptive statistics
| Standard deviation | 1010.48178 |
|---|---|
| Coefficient of variation (CV) | 0.5774743837 |
| Kurtosis | -1.199726532 |
| Mean | 1749.829617 |
| Median Absolute Deviation (MAD) | 875 |
| Skewness | 0.0002647233067 |
| Sum | 6120904 |
| Variance | 1021073.427 |
| Monotonicity | Strictly increasing |
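Several of the derived statistics for this variable can be reproduced from the other reported values. The following is a minimal sketch using only figures copied from the tables above; the helper `is_strictly_increasing` is a hypothetical illustration of what the monotonicity row expresses, not the report's own implementation.

```python
import math

# Values copied from the quantile and descriptive statistics tables above.
mean = 1749.829617
std_dev = 1010.48178
variance = 1021073.427
q1, q3 = 875.25, 2624.75
minimum, maximum = 0, 3500

cv = std_dev / mean                      # coefficient of variation
iqr = q3 - q1                            # interquartile range
value_range = maximum - minimum          # range
std_from_variance = math.sqrt(variance)  # should match std_dev


def is_strictly_increasing(values):
    """Return True when every value is greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


print(cv, iqr, value_range, std_from_variance)
print(is_strictly_increasing([0, 1, 2, 5]))
```

Up to rounding of the inputs, the computed values agree with the reported CV, IQR, range, and standard deviation.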
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 2337 | 1 | < 0.1% |
| 2326 | 1 | < 0.1% |
| 2327 | 1 | < 0.1% |
| 2328 | 1 | < 0.1% |
| 2329 | 1 | < 0.1% |
| 2330 | 1 | < 0.1% |
| 2331 | 1 | < 0.1% |
| 2332 | 1 | < 0.1% |
| 2333 | 1 | < 0.1% |
| Other values (3488) | 3488 | 99.7% |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 3500 | 1 | |
| 3499 | 1 | |
| 3498 | 1 | |
| 3497 | 1 | |
| 3496 | 1 | |
| 3495 | 1 | |
| 3494 | 1 | |
| 3493 | 1 | |
| 3492 | 1 | |
| 3491 | 1 |
Company
| Distinct | 17 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.848198971 |
| Minimum | 0 |
|---|---|
| Maximum | 16 |
| Zeros | 2 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 4 |
| median | 8 |
| Q3 | 8 |
| 95-th percentile | 14 |
| Maximum | 16 |
| Range | 16 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 3.40802572 |
|---|---|
| Coefficient of variation (CV) | 0.4976528477 |
| Kurtosis | 0.1610136514 |
| Mean | 6.848198971 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.7479409017 |
| Sum | 23955 |
| Variance | 11.61463931 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=17)
| Value | Count | Frequency (%) |
| 8 | 1458 | 41.7% |
| 4 | 961 | 27.5% |
| 3 | 322 | 9.2% |
| 12 | 152 | 4.3% |
| 14 | 140 | 4.0% |
| 2 | 117 | 3.3% |
| 16 | 69 | 2.0% |
| 15 | 58 | 1.7% |
| 7 | 52 | 1.5% |
| 6 | 39 | 1.1% |
| Other values (7) | 130 | 3.7% |
| Value | Count | Frequency (%) |
| 0 | 2 | 0.1% |
| 1 | 32 | 0.9% |
| 2 | 117 | 3.3% |
| 3 | 322 | 9.2% |
| 4 | 961 | |
| 5 | 33 | 0.9% |
| 6 | 39 | 1.1% |
| 7 | 52 | 1.5% |
| 8 | 1458 | |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 16 | 69 | 2.0% |
| 15 | 58 | 1.7% |
| 14 | 140 | 4.0% |
| 13 | 19 | 0.5% |
| 12 | 152 | 4.3% |
| 11 | 8 | 0.2% |
| 10 | 35 | 1.0% |
| 9 | 1 | < 0.1% |
| 8 | 1458 | |
| 7 | 52 | 1.5% |
Model_name
| Distinct | 618 |
|---|---|
| Distinct (%) | 17.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 283.2798742 |
| Minimum | 0 |
|---|---|
| Maximum | 617 |
| Zeros | 3 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 110 |
| median | 279 |
| Q3 | 458 |
| 95-th percentile | 574 |
| Maximum | 617 |
| Range | 617 |
| Interquartile range (IQR) | 348 |
Descriptive statistics
| Standard deviation | 186.8394191 |
|---|---|
| Coefficient of variation (CV) | 0.6595576889 |
| Kurtosis | -1.26594318 |
| Mean | 283.2798742 |
| Median Absolute Deviation (MAD) | 174 |
| Skewness | 0.09500963119 |
| Sum | 990913 |
| Variance | 34908.96855 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 553 | 120 | 3.4% |
| 458 | 101 | 2.9% |
| 46 | 98 | 2.8% |
| 9 | 92 | 2.6% |
| 283 | 85 | 2.4% |
| 17 | 68 | 1.9% |
| 172 | 47 | 1.3% |
| 108 | 39 | 1.1% |
| 51 | 38 | 1.1% |
| 279 | 37 | 1.1% |
| Other values (608) | 2773 | 79.3% |
| Value | Count | Frequency (%) |
| 0 | 3 | 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 3 | 0.1% |
| 4 | 21 | 0.6% |
| 5 | 6 | 0.2% |
| 6 | 2 | 0.1% |
| 7 | 2 | 0.1% |
| 8 | 2 | 0.1% |
| 9 | 92 |
| Value | Count | Frequency (%) |
| 617 | 1 | < 0.1% |
| 616 | 8 | |
| 615 | 2 | 0.1% |
| 614 | 1 | < 0.1% |
| 613 | 16 | |
| 612 | 2 | 0.1% |
| 611 | 1 | < 0.1% |
| 610 | 7 | |
| 609 | 8 | |
| 608 | 2 | 0.1% |
Model_year
| Distinct | 15 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2017.276158 |
| Minimum | 2008 |
|---|---|
| Maximum | 2022 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 2008 |
|---|---|
| 5-th percentile | 2012 |
| Q1 | 2016 |
| median | 2018 |
| Q3 | 2019 |
| 95-th percentile | 2021 |
| Maximum | 2022 |
| Range | 14 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.53616854 |
|---|---|
| Coefficient of variation (CV) | 0.001257224267 |
| Kurtosis | 0.1523105267 |
| Mean | 2017.276158 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.7057338161 |
| Sum | 7056432 |
| Variance | 6.432150861 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=15)
| Value | Count | Frequency (%) |
| 2018 | 682 | 19.5% |
| 2019 | 538 | 15.4% |
| 2017 | 499 | 14.3% |
| 2020 | 386 | 11.0% |
| 2016 | 334 | 9.5% |
| 2021 | 260 | 7.4% |
| 2014 | 244 | 7.0% |
| 2015 | 234 | 6.7% |
| 2013 | 112 | 3.2% |
| 2012 | 93 | 2.7% |
| Other values (5) | 116 | 3.3% |
| Value | Count | Frequency (%) |
| 2008 | 1 | < 0.1% |
| 2009 | 4 | 0.1% |
| 2010 | 45 | 1.3% |
| 2011 | 47 | 1.3% |
| 2012 | 93 | 2.7% |
| 2013 | 112 | 3.2% |
| 2014 | 244 | |
| 2015 | 234 | |
| 2016 | 334 | |
| 2017 | 499 |
| Value | Count | Frequency (%) |
| 2022 | 19 | 0.5% |
| 2021 | 260 | 7.4% |
| 2020 | 386 | |
| 2019 | 538 | |
| 2018 | 682 | |
| 2017 | 499 | |
| 2016 | 334 | |
| 2015 | 234 | 6.7% |
| 2014 | 244 | 7.0% |
| 2013 | 112 | 3.2% |
Kilometers
| Distinct | 3422 |
|---|---|
| Distinct (%) | 97.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 42222.7773 |
| Minimum | 269 |
|---|---|
| Maximum | 455601 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 269 |
|---|---|
| 5-th percentile | 8165.65 |
| Q1 | 22052.5 |
| median | 39245.5 |
| Q3 | 59328.75 |
| 95-th percentile | 87893.25 |
| Maximum | 455601 |
| Range | 455332 |
| Interquartile range (IQR) | 37276.25 |
Descriptive statistics
| Standard deviation | 25492.86696 |
|---|---|
| Coefficient of variation (CV) | 0.6037704904 |
| Kurtosis | 19.77983252 |
| Mean | 42222.7773 |
| Median Absolute Deviation (MAD) | 18320.5 |
| Skewness | 1.717933625 |
| Sum | 147695275 |
| Variance | 649886265.6 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 20314 | 3 | 0.1% |
| 42796 | 2 | 0.1% |
| 31511 | 2 | 0.1% |
| 11159 | 2 | 0.1% |
| 28809 | 2 | 0.1% |
| 7632 | 2 | 0.1% |
| 63497 | 2 | 0.1% |
| 19318 | 2 | 0.1% |
| 57801 | 2 | 0.1% |
| 28198 | 2 | 0.1% |
| Other values (3412) | 3477 | 99.4% |
| Value | Count | Frequency (%) |
| 269 | 1 | |
| 410 | 1 | |
| 1076 | 1 | |
| 1087 | 1 | |
| 1122 | 1 | |
| 1298 | 1 | |
| 1345 | 1 | |
| 1444 | 1 | |
| 1568 | 1 | |
| 1650 | 1 |
| Value | Count | Frequency (%) |
| 455601 | 1 | |
| 242614 | 1 | |
| 110457 | 1 | |
| 101526 | 1 | |
| 100170 | 1 | |
| 100162 | 1 | |
| 100003 | 1 | |
| 99957 | 1 | |
| 99890 | 1 | |
| 99854 | 1 |
Owner
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 198.3 KiB |
| 0 | 2619 |
|---|---|
| 1 | 823 |
| 2 | 56 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 3498 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 2619 | 74.9% |
| 1 | 823 | 23.5% |
| 2 | 56 | 1.6% |
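The Frequency (%) column is each count divided by the total number of observations. A minimal sketch of that calculation for the Owner counts above, assuming one-decimal rounding as displayed in the report (the dictionary name is illustrative):

```python
# Counts copied from the Owner "Common Values" table above.
owner_counts = {"0": 2619, "1": 823, "2": 56}

total = sum(owner_counts.values())

# Percentage of observations per category, rounded to one decimal place.
owner_freq = {
    value: round(100 * count / total, 1)
    for value, count in owner_counts.items()
}

for value, pct in owner_freq.items():
    print(f"{value}: {pct}%")
```

The total equals the 3498 observations reported in the dataset statistics, so the percentages cover every row.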
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 2619 | |
| 1 | 823 | 23.5% |
| 2 | 56 | 1.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 2619 | |
| 1 | 823 | 23.5% |
| 2 | 56 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3498 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2619 | |
| 1 | 823 | 23.5% |
| 2 | 56 | 1.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3498 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 2619 | |
| 1 | 823 | 23.5% |
| 2 | 56 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3498 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 2619 | |
| 1 | 823 | 23.5% |
| 2 | 56 | 1.6% |
Transmission
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 198.3 KiB |
| 1 | 2861 |
|---|---|
| 0 | 637 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 3498 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 2861 | 81.8% |
| 0 | 637 | 18.2% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1 | 2861 | |
| 0 | 637 | 18.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 2861 | |
| 0 | 637 | 18.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3498 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 2861 | |
| 0 | 637 | 18.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3498 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 2861 | |
| 0 | 637 | 18.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3498 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 2861 | |
| 0 | 637 | 18.2% |
Registration
| Distinct | 318 |
|---|---|
| Distinct (%) | 9.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 147.0731847 |
| Minimum | 0 |
|---|---|
| Maximum | 317 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 20 |
| Q1 | 77 |
| median | 158 |
| Q3 | 222 |
| 95-th percentile | 301 |
| Maximum | 317 |
| Range | 317 |
| Interquartile range (IQR) | 145 |
Descriptive statistics
| Standard deviation | 90.45561488 |
|---|---|
| Coefficient of variation (CV) | 0.6150381191 |
| Kurtosis | -1.139576989 |
| Mean | 147.0731847 |
| Median Absolute Deviation (MAD) | 72 |
| Skewness | 0.1905185699 |
| Sum | 514462 |
| Variance | 8182.218263 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 86 | 145 | 4.1% |
| 63 | 112 | 3.2% |
| 285 | 108 | 3.1% |
| 166 | 106 | 3.0% |
| 119 | 97 | 2.8% |
| 88 | 94 | 2.7% |
| 160 | 85 | 2.4% |
| 87 | 84 | 2.4% |
| 28 | 81 | 2.3% |
| 158 | 80 | 2.3% |
| Other values (308) | 2506 | 71.6% |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 2 | 0.1% |
| 2 | 6 | |
| 3 | 2 | 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 4 | |
| 8 | 6 | |
| 9 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 317 | 1 | < 0.1% |
| 316 | 2 | 0.1% |
| 315 | 2 | 0.1% |
| 314 | 1 | < 0.1% |
| 313 | 1 | < 0.1% |
| 312 | 11 | |
| 311 | 19 | |
| 310 | 1 | < 0.1% |
| 309 | 13 | |
| 308 | 6 | 0.2% |
State
| Distinct | 15 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.267295597 |
| Minimum | 0 |
|---|---|
| Maximum | 14 |
| Zeros | 33 |
| Zeros (%) | 0.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 14 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.057602072 |
|---|---|
| Coefficient of variation (CV) | 0.647424716 |
| Kurtosis | -0.9374076627 |
| Mean | 6.267295597 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.5048307918 |
| Sum | 21923 |
| Variance | 16.46413458 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=15)
| Value | Count | Frequency (%) |
| 4 | 732 | 20.9% |
| 6 | 615 | 17.6% |
| 1 | 413 | 11.8% |
| 10 | 374 | 10.7% |
| 13 | 256 | 7.3% |
| 3 | 234 | 6.7% |
| 2 | 211 | 6.0% |
| 14 | 194 | 5.5% |
| 12 | 157 | 4.5% |
| 9 | 125 | 3.6% |
| Other values (5) | 187 | 5.3% |
| Value | Count | Frequency (%) |
| 0 | 33 | 0.9% |
| 1 | 413 | |
| 2 | 211 | 6.0% |
| 3 | 234 | 6.7% |
| 4 | 732 | |
| 5 | 54 | 1.5% |
| 6 | 615 | |
| 7 | 98 | 2.8% |
| 8 | 1 | < 0.1% |
| 9 | 125 | 3.6% |
| Value | Count | Frequency (%) |
| 14 | 194 | 5.5% |
| 13 | 256 | |
| 12 | 157 | 4.5% |
| 11 | 1 | < 0.1% |
| 10 | 374 | |
| 9 | 125 | 3.6% |
| 8 | 1 | < 0.1% |
| 7 | 98 | 2.8% |
| 6 | 615 | |
| 5 | 54 | 1.5% |
Fuel_Type
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 198.3 KiB |
| 1 | 2923 |
|---|---|
| 0 | 447 |
| 2 | 128 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 3498 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 2 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 2923 | 83.6% |
| 0 | 447 | 12.8% |
| 2 | 128 | 3.7% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1 | 2923 | |
| 0 | 447 | 12.8% |
| 2 | 128 | 3.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 2923 | |
| 0 | 447 | 12.8% |
| 2 | 128 | 3.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3498 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 2923 | |
| 0 | 447 | 12.8% |
| 2 | 128 | 3.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3498 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 2923 | |
| 0 | 447 | 12.8% |
| 2 | 128 | 3.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3498 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 2923 | |
| 0 | 447 | 12.8% |
| 2 | 128 | 3.7% |
Mileage
| Distinct | 138 |
|---|---|
| Distinct (%) | 3.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.24419668 |
| Minimum | 10.3 |
|---|---|
| Maximum | 35.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 27.5 KiB |
Quantile statistics
| Minimum | 10.3 |
|---|---|
| 5-th percentile | 15.1 |
| Q1 | 18.2 |
| median | 20.2 |
| Q3 | 22 |
| 95-th percentile | 25 |
| Maximum | 35.6 |
| Range | 25.3 |
| Interquartile range (IQR) | 3.8 |
Descriptive statistics
| Standard deviation | 3.261909491 |
|---|---|
| Coefficient of variation (CV) | 0.1611281269 |
| Kurtosis | 1.805088027 |
| Mean | 20.24419668 |
| Median Absolute Deviation (MAD) | 1.9 |
| Skewness | 0.6259047211 |
| Sum | 70814.2 |
| Variance | 10.64005353 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 21.4 | 167 | 4.8% |
| 19.8 | 154 | 4.4% |
| 18.6 | 142 | 4.1% |
| 20.5 | 141 | 4.0% |
| 18.9 | 114 | 3.3% |
| 22 | 100 | 2.9% |
| 24.7 | 100 | 2.9% |
| 21.2 | 84 | 2.4% |
| 18 | 81 | 2.3% |
| 23 | 77 | 2.2% |
| Other values (128) | 2338 | 66.8% |
| Value | Count | Frequency (%) |
| 10.3 | 1 | < 0.1% |
| 10.4 | 1 | < 0.1% |
| 10.8 | 2 | 0.1% |
| 10.9 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 11.6 | 3 | 0.1% |
| 11.9 | 2 | 0.1% |
| 12 | 10 | |
| 12.1 | 1 | < 0.1% |
| 12.4 | 3 | 0.1% |
| Value | Count | Frequency (%) |
| 35.6 | 4 | 0.1% |
| 33.5 | 7 | 0.2% |
| 33.4 | 1 | < 0.1% |
| 32.3 | 4 | 0.1% |
| 31.8 | 12 | |
| 31.5 | 9 | |
| 31.2 | 10 | |
| 30.5 | 5 | 0.1% |
| 28.4 | 18 | |
| 28.1 | 6 | 0.2% |
Fuel_capacity
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.13750715 |
| Minimum | 27 |
|---|---|
| Maximum | 80 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 27 |
|---|---|
| 5-th percentile | 32 |
| Q1 | 35 |
| median | 40 |
| Q3 | 45 |
| 95-th percentile | 60 |
| Maximum | 80 |
| Range | 53 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 8.212102131 |
|---|---|
| Coefficient of variation (CV) | 0.1996256628 |
| Kurtosis | 1.276642065 |
| Mean | 41.13750715 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 1.084564781 |
| Sum | 143899 |
| Variance | 67.43862141 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=24)
| Value | Count | Frequency (%) |
| 35 | 891 | 25.5% |
| 37 | 471 | 13.5% |
| 43 | 352 | 10.1% |
| 45 | 315 | 9.0% |
| 40 | 303 | 8.7% |
| 42 | 211 | 6.0% |
| 60 | 202 | 5.8% |
| 28 | 120 | 3.4% |
| 32 | 109 | 3.1% |
| 55 | 96 | 2.7% |
| Other values (14) | 428 | 12.2% |
| Value | Count | Frequency (%) |
| 27 | 50 | 1.4% |
| 28 | 120 | 3.4% |
| 32 | 109 | 3.1% |
| 35 | 891 | |
| 37 | 471 | |
| 40 | 303 | 8.7% |
| 41 | 10 | 0.3% |
| 42 | 211 | 6.0% |
| 43 | 352 | 10.1% |
| 44 | 37 | 1.1% |
| Value | Count | Frequency (%) |
| 80 | 2 | 0.1% |
| 71 | 1 | < 0.1% |
| 70 | 30 | 0.9% |
| 66 | 1 | < 0.1% |
| 65 | 5 | 0.1% |
| 62 | 3 | 0.1% |
| 60 | 202 | |
| 58 | 2 | 0.1% |
| 55 | 96 | |
| 52 | 94 |
Seating_capacity
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 198.3 KiB |
| 5 | 3354 |
|---|---|
| 7 | 115 |
| 4 | 22 |
| 6 | 6 |
| 8 | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 3498 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 5 |
|---|---|
| 2nd row | 5 |
| 3rd row | 5 |
| 4th row | 5 |
| 5th row | 5 |
Common Values
| Value | Count | Frequency (%) |
| 5 | 3354 | 95.9% |
| 7 | 115 | 3.3% |
| 4 | 22 | 0.6% |
| 6 | 6 | 0.2% |
| 8 | 1 | < 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 5 | 3354 | |
| 7 | 115 | 3.3% |
| 4 | 22 | 0.6% |
| 6 | 6 | 0.2% |
| 8 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 3354 | |
| 7 | 115 | 3.3% |
| 4 | 22 | 0.6% |
| 6 | 6 | 0.2% |
| 8 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3498 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 3354 | |
| 7 | 115 | 3.3% |
| 4 | 22 | 0.6% |
| 6 | 6 | 0.2% |
| 8 | 1 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3498 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 3354 | |
| 7 | 115 | 3.3% |
| 4 | 22 | 0.6% |
| 6 | 6 | 0.2% |
| 8 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3498 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 3354 | |
| 7 | 115 | 3.3% |
| 4 | 22 | 0.6% |
| 6 | 6 | 0.2% |
| 8 | 1 | < 0.1% |
City
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.341909663 |
| Minimum | 0 |
|---|---|
| Maximum | 11 |
| Zeros | 216 |
| Zeros (%) | 6.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 8 |
| 95-th percentile | 11 |
| Maximum | 11 |
| Range | 11 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 3.544926982 |
|---|---|
| Coefficient of variation (CV) | 0.8164442047 |
| Kurtosis | -0.9970423984 |
| Mean | 4.341909663 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.6864048815 |
| Sum | 15188 |
| Variance | 12.5665073 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
| Value | Count | Frequency (%) |
| 3 | 777 | 22.2% |
| 1 | 733 | 21.0% |
| 10 | 398 | 11.4% |
| 2 | 373 | 10.7% |
| 0 | 216 | 6.2% |
| 11 | 214 | 6.1% |
| 8 | 196 | 5.6% |
| 4 | 190 | 5.4% |
| 6 | 123 | 3.5% |
| 9 | 120 | 3.4% |
| Other values (2) | 158 | 4.5% |
| Value | Count | Frequency (%) |
| 0 | 216 | 6.2% |
| 1 | 733 | |
| 2 | 373 | |
| 3 | 777 | |
| 4 | 190 | 5.4% |
| 5 | 104 | 3.0% |
| 6 | 123 | 3.5% |
| 7 | 54 | 1.5% |
| 8 | 196 | 5.6% |
| 9 | 120 | 3.4% |
| Value | Count | Frequency (%) |
| 11 | 214 | 6.1% |
| 10 | 398 | |
| 9 | 120 | 3.4% |
| 8 | 196 | 5.6% |
| 7 | 54 | 1.5% |
| 6 | 123 | 3.5% |
| 5 | 104 | 3.0% |
| 4 | 190 | 5.4% |
| 3 | 777 | |
| 2 | 373 |
Price
| Distinct | 2883 |
|---|---|
| Distinct (%) | 82.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 637968.8928 |
| Minimum | 135099 |
|---|---|
| Maximum | 2790699 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 13.8 KiB |
Quantile statistics
| Minimum | 135099 |
|---|---|
| 5-th percentile | 291369 |
| Q1 | 427449 |
| median | 560349 |
| Q3 | 746711.5 |
| 95-th percentile | 1276104 |
| Maximum | 2790699 |
| Range | 2655600 |
| Interquartile range (IQR) | 319262.5 |
Descriptive statistics
| Standard deviation | 312140.1661 |
|---|---|
| Coefficient of variation (CV) | 0.4892717649 |
| Kurtosis | 3.242943785 |
| Mean | 637968.8928 |
| Median Absolute Deviation (MAD) | 150900 |
| Skewness | 1.597687542 |
| Sum | 2231615187 |
| Variance | 9.743148332 × 10^10 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 687199 | 6 | 0.2% |
| 342599 | 6 | 0.2% |
| 651799 | 5 | 0.1% |
| 603599 | 5 | 0.1% |
| 704199 | 5 | 0.1% |
| 877699 | 5 | 0.1% |
| 417399 | 5 | 0.1% |
| 420299 | 4 | 0.1% |
| 522899 | 4 | 0.1% |
| 453799 | 4 | 0.1% |
| Other values (2873) | 3449 | 98.6% |
| Value | Count | Frequency (%) |
| 135099 | 1 | |
| 141399 | 1 | |
| 152999 | 1 | |
| 166299 | 1 | |
| 167499 | 1 | |
| 170699 | 1 | |
| 171499 | 1 | |
| 174299 | 1 | |
| 181299 | 1 | |
| 185799 | 1 |
| Value | Count | Frequency (%) |
| 2790699 | 1 | |
| 2028299 | 1 | |
| 2012499 | 1 | |
| 2002499 | 1 | |
| 1997699 | 1 | |
| 1941699 | 1 | |
| 1930299 | 1 | |
| 1922799 | 1 | |
| 1921499 | 1 | |
| 1915899 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
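The rank-based calculation described for ρ can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; it assumes tied values receive the average of their ranks, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear relationship
print(spearman(x, y), pearson(x, y))
```

On this toy input the relationship is perfectly monotonic, so ρ reaches its maximum while Pearson's r on the raw values stays below it.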
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
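The definition of r and its invariance under changes of location and scale can be illustrated with a short sketch. The data are toy values chosen for the example, and the function name is hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy data, not taken from the dataset.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)

# Shift and rescale x (with a positive scale factor): r should be unchanged.
x_transformed = [2 * a + 3 for a in x]
r_transformed = pearson_r(x_transformed, y)

print(r, r_transformed)
```

The normalising factors cancel between numerator and denominator, so the sums can be used directly without dividing by n or n - 1.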
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
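The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch under that assumption follows; tied pairs count as neither concordant nor discordant, and the function name is hypothetical. Libraries such as SciPy default to the tie-adjusted τ-b, so results can differ on data with many ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; it counts as neither.
    return (concordant - discordant) / len(pairs)


# Toy data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = 4/6.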
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | Company | Model_name | Model_year | Kilometers | Owner | Transmission | Registration | State | Fuel_Type | Mileage | Fuel_capacity | Seating_capacity | City | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 8 | 12 | 2016 | 9666 | 0 | 1 | 166 | 6 | 1 | 24.7 | 35 | 5 | 11 | 323999 |
| 1 | 1 | 8 | 552 | 2017 | 66693 | 1 | 1 | 166 | 6 | 2 | 26.6 | 35 | 5 | 11 | 482399 |
| 2 | 2 | 8 | 551 | 2014 | 40532 | 1 | 1 | 158 | 6 | 1 | 20.5 | 35 | 5 | 11 | 367599 |
| 3 | 3 | 3 | 111 | 2015 | 60086 | 0 | 1 | 167 | 6 | 1 | 17.8 | 40 | 5 | 11 | 701799 |
| 4 | 4 | 8 | 93 | 2016 | 29544 | 0 | 1 | 166 | 6 | 1 | 20.7 | 43 | 5 | 11 | 682099 |
| 5 | 5 | 7 | 340 | 2019 | 49956 | 0 | 1 | 157 | 6 | 0 | 17.3 | 45 | 7 | 11 | 1153499 |
| 6 | 6 | 14 | 362 | 2018 | 49765 | 0 | 0 | 170 | 6 | 1 | 17.0 | 44 | 5 | 11 | 801899 |
| 7 | 7 | 14 | 364 | 2018 | 80038 | 0 | 0 | 166 | 6 | 0 | 17.9 | 44 | 5 | 11 | 803299 |
| 8 | 8 | 3 | 110 | 2018 | 46497 | 0 | 0 | 167 | 6 | 1 | 18.0 | 40 | 5 | 11 | 957799 |
| 9 | 9 | 8 | 172 | 2018 | 10340 | 0 | 1 | 166 | 6 | 1 | 21.2 | 37 | 5 | 11 | 699499 |
Last rows
| | df_index | Company | Model_name | Model_year | Kilometers | Owner | Transmission | Registration | State | Fuel_Type | Mileage | Fuel_capacity | Seating_capacity | City | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3488 | 3491 | 12 | 337 | 2018 | 55408 | 0 | 1 | 191 | 9 | 1 | 23.0 | 28 | 5 | 6 | 337699 |
| 3489 | 3492 | 3 | 58 | 2012 | 77735 | 0 | 1 | 197 | 9 | 1 | 18.9 | 35 | 5 | 6 | 276499 |
| 3490 | 3493 | 12 | 336 | 2019 | 12427 | 0 | 1 | 211 | 9 | 1 | 25.0 | 28 | 5 | 6 | 429099 |
| 3491 | 3494 | 12 | 330 | 2018 | 45942 | 1 | 1 | 192 | 9 | 1 | 23.0 | 28 | 5 | 6 | 320999 |
| 3492 | 3495 | 8 | 51 | 2021 | 38321 | 0 | 1 | 211 | 9 | 1 | 21.4 | 37 | 5 | 6 | 699499 |
| 3493 | 3496 | 4 | 226 | 2016 | 38453 | 0 | 1 | 197 | 9 | 1 | 21.1 | 32 | 5 | 6 | 296599 |
| 3494 | 3497 | 16 | 383 | 2017 | 52422 | 1 | 1 | 197 | 9 | 1 | 16.5 | 45 | 5 | 6 | 554099 |
| 3495 | 3498 | 4 | 226 | 2016 | 40136 | 0 | 1 | 197 | 9 | 1 | 21.1 | 32 | 5 | 6 | 284499 |
| 3496 | 3499 | 8 | 236 | 2017 | 242614 | 0 | 1 | 192 | 9 | 0 | 24.5 | 45 | 7 | 6 | 706199 |
| 3497 | 3500 | 12 | 336 | 2020 | 42570 | 1 | 1 | 194 | 9 | 1 | 25.0 | 28 | 5 | 6 | 376299 |